ABSTRACT

We consider a distributed optimization problem where n nodes, S_l, l \in {1,...,n}, wish to minimize a common strongly convex function f(x),
x = [x_1,...,x_n]^T, and suppose that node S_l only has control of variable x_l. The nodes locally update their respective variables and
periodically exchange their values over noisy channels. Previous studies of this problem have mainly focused on the convergence issue and
the analysis of convergence rate. In this work, we focus on the communication energy and study its impact on convergence.

In particular, we study the minimum amount of communication energy required for nodes to obtain an \epsilon-minimizer of f(x) in the
mean square sense. In an earlier work, we considered analog communication schemes and proved that the communication energy must grow
at the rate of \Omega(\epsilon^{-1}) to obtain an \epsilon-minimizer of a convex quadratic function. In this paper, we consider digital
communication schemes and propose a distributed algorithm which only requires communication energy of O((\log \epsilon^{-1})^3) to
obtain an \epsilon-minimizer of f(x). Furthermore, the algorithm provided herein converges linearly. Thus, distributed optimization
with digital communication schemes is significantly more energy efficient than with analog communication schemes.

Index Terms: Distributed optimization, Sensor networks, Energy constraint, Convergence

1.  INTRODUCTION 

Consider a network of n nodes which collaborate to minimize a cost
f(x), x = [x_1, ..., x_n]^T, where x_l is a local (vector) variable
controlled by node S_l. Each node can perform local computation and
exchange messages with a set of predefined neighbors through
orthogonal noisy channels. Moreover, we assume f(x) has a certain
"local structure" in the sense that its partial derivative with respect
to x_l only depends on the local variables at node S_l and its
neighbors. A distributed optimization problem of this kind arises naturally
in sensor network applications. For example, in the sensor
localization problem, we are given the locations of anchor nodes and
distance measurements between certain neighbor nodes in the network.
The goal is to estimate the locations of all sensors in the network
by distributed minimization of a cost function f(x) defined by the
L_p norm of distance errors [2]. In this context, x_l is the location of
sensor S_l and is to be estimated by S_l. To minimize f(x), sensor S_l
periodically updates its local variable, x_l, and exchanges information
with neighbor nodes through orthogonal noisy channels. A special
feature of this problem is the fact that nodes are usually battery
operated and hence energy-constrained. Note that the energy of each node
is consumed by various operations including local computation and
inter-sensor communication, with the latter being the dominant part.
This motivates us to study the minimum amount of communication
energy required for distributed optimization.

Energy consumption has not been a consideration of algorithm
design in classical distributed optimization [1]. Even recent studies
of distributed optimization in the context of sensor networks [7, 4]
have mainly focused on convergence issues such as convergence
criteria and convergence rate. To the best of our knowledge, the most
relevant work to this paper is [6], which studied the minimum number
of bits that must be exchanged between two nodes in order to find
an \epsilon-minimizer of f. However, unlike our current work, the
communication channel is assumed distortion-less in [6], and there was no
effort to characterize minimum energy consumption.

Recently, we considered an analog communication scheme for
this distributed optimization problem and proved that the
communication energy must grow at the rate of \Omega(\epsilon^{-1}) in order to obtain an
\epsilon-minimizer of a convex quadratic f(x) [5]. In this paper, instead, we
consider digital communication schemes for a wider class of cost
functions, namely strongly convex functions, and propose a distributed
algorithm which requires O((\log \epsilon^{-1})^3) communication energy to
obtain an \epsilon-minimizer of f(x). Furthermore, the algorithm provided
herein converges linearly to the optimal solution, whereas our
previous algorithm has a sub-linear convergence rate. Thus,
digital communication schemes are far more energy efficient than
analog communication schemes for distributed optimization.
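
As a rough numerical illustration of this gap, the following Python sketch evaluates the two growth rates, \epsilon^{-1} and (\log \epsilon^{-1})^3, for a few accuracy levels. All constants are dropped, which is an assumption made only for illustration: the actual constants depend on the channel and on the algorithm parameters, so the numbers indicate asymptotic trends rather than measured energy. The function names are hypothetical.

```python
import math


def analog_energy_rate(eps):
    """Growth rate eps^-1 of the analog lower bound (constants dropped)."""
    return 1.0 / eps


def digital_energy_rate(eps):
    """Growth rate (log eps^-1)^3 of the digital upper bound (constants dropped)."""
    return math.log(1.0 / eps) ** 3


# Compare the two rates as the target accuracy eps shrinks.
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"eps={eps:.0e}  analog~{analog_energy_rate(eps):.3g}  "
          f"digital~{digital_energy_rate(eps):.3g}")
```

For moderate accuracy the two rates are comparable, and which one is smaller depends on the omitted constants; the gap opens up as \epsilon decreases, since the analog rate grows by a factor of 100 for every two decades of accuracy while the cubic-logarithmic rate grows far more slowly.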

2.  ALGORITHM  FRAMEWORK 

Let \mathcal{F}_{sc,M,L} ('strongly convex functions') be the set of continuously
differentiable functions f(x) with the property

    L ||x - y||^2 \le \langle \nabla f(x) - \nabla f(y), x - y \rangle \le ML ||x - y||^2,    (1)

where \nabla f(x) is the gradient vector of the function f(x) at point x, and M
and L are positive numbers.
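
To make property (1) concrete, the sketch below checks it numerically for a quadratic f(x) = (1/2) x^T Q x with a diagonal Q, for which \nabla f(x) - \nabla f(y) = Q(x - y). Taking L as the smallest eigenvalue and M as the condition number of Q is consistent with the simulation section; the specific matrix, the sample count, and the helper names are assumptions chosen for illustration.

```python
import random

Q_DIAG = [1.0, 2.0]                # eigenvalues of an assumed diagonal Q
L = min(Q_DIAG)                    # smallest eigenvalue
M = max(Q_DIAG) / min(Q_DIAG)      # condition number


def grad(x):
    """Gradient of f(x) = 0.5 * x^T Q x for diagonal Q."""
    return [q * xi for q, xi in zip(Q_DIAG, x)]


def satisfies_property_1(x, y, tol=1e-12):
    """Check L||x-y||^2 <= <grad f(x) - grad f(y), x - y> <= ML||x-y||^2."""
    d = [xi - yi for xi, yi in zip(x, y)]
    gd = [gx - gy for gx, gy in zip(grad(x), grad(y))]
    inner = sum(a * b for a, b in zip(gd, d))
    sq_norm = sum(a * a for a in d)
    return L * sq_norm - tol <= inner <= M * L * sq_norm + tol


rng = random.Random(0)
checks = [
    satisfies_property_1([rng.random(), rng.random()],
                         [rng.random(), rng.random()])
    for _ in range(1000)
]
print(all(checks))
```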

Consider a distributed optimization problem where n nodes, S_l,
l \in {1, ..., n}, jointly minimize a common cost function f(x) \in
\mathcal{F}_{sc,M,L}, x = [x_1, ..., x_n]^T. Node S_l only has control of
variable x_l and is able to compute the partial derivative of the cost
function with respect to its local variable. Furthermore, we assume
that each local variable x_l, l \in {1, ..., n}, has a finite range and is
bounded to [0, 1].

We assume that nodes communicate through orthogonal
time-invariant noisy channels. The communication channel between nodes
S_l and S_j is corrupted by Additive White Gaussian Noise (AWGN)
with power spectral density N_0/2:

    \hat{m}_{l,j} = d_{l,j}^{-k/2} m_l + v_{l,j},

where \hat{m}_{l,j} is the received message at node S_j from node S_l, and v_{l,j}
is the AWGN. The signal power received at node S_j is assumed to
be inversely proportional to d_{l,j}^k, where d_{l,j} is the distance between
nodes S_j and S_l, and k is the path loss exponent. We assume that the
energy required for transmission of m_l is proportional to the number
of bits in the message (b_l). This is the case, e.g., if nodes use
M-QAM or M-PSK modulation to transmit messages. For example, if
M-QAM is used, the energy per bit W_l(P_b) is [3]:

    W_l(P_b)

where s is the number of bits per M-QAM symbol, N_f is the receiver
noise figure, N_{l,j} is the power spectral density of channel noise
between nodes S_l and S_j, G_0 is the system constant defined as in
[3], and P_b is the required bit error probability. Therefore, the total
communication energy is

    E_com(t) = \sum_{i=1}^{t} \sum_{l=1}^{n} W_l(P_b(i)) b_l(i).

This paper aims to study the minimum communication energy
required to obtain an \epsilon-minimizer of f(x) in the mean square sense.
A point x is an \epsilon-minimizer of f(x) in the mean square sense if

    E[||x - x^*||^2] \le \epsilon,   x^* = \arg\min_x f(x).

Here, we introduce a distributed algorithm in which node S_l
iteratively updates its local variable x_l and tracks the variables of its
neighbors. The algorithm consists of two parts: a digital
communication scheme and a local computation scheme at each node.

A. Communication scheme: After each local update, node S_l should
relay its information to the other nodes. One way is to directly
send the quantized version of the updated value of its local variable,
x_l^l(t+1), which requires transmitting more bits, and thus
consuming more energy, as the algorithm proceeds. An alternative is to
send the quantized version of the incremental value, x_l^l(t+1) - x_l^l(t),
which requires transmitting a constant number of bits at each
iteration but may not guarantee the convergence of the algorithm due
to communication error. In general, we can consider a linear
messaging scheme where the transmitted signal, m_l(t), is given as

    m_l(t) = Q(x_l^l(t) - \gamma(t) x_l^l(t-1)) = [sgn_l(t), amp_l(t)],    (2)

where sgn_l(t) represents the sign of x_l^l(t) - \gamma(t) x_l^l(t-1), and
amp_l(t) is the binary representation of the integer part of
2^{R(t)} |x_l^l(t) - \gamma(t) x_l^l(t-1)|. The resolution of the quantizer, R(t),
is a design parameter and will be determined later. Receiver node S_j first detects
\hat{m}_l(t) = [\hat{sgn}_l(t), \hat{amp}_l(t)] and then reconstructs x_l as follows
(x_l^j(0) = 0):

    x_l^j(t) = \gamma(t) x_l^j(t-1) + \hat{sgn}_l(t) 2^{-R(t)} \hat{amp}_l(t).    (3)

Notice that variable x_l^j(t) is a noisy copy of node S_l's variable x_l^l(t)
at node S_j. Define n_{Q,l}(t) and e_{l,j}(t) as the quantization noise at node
S_l at iteration t, and the communication error due to channel noise
between nodes S_l and S_j at iteration t, respectively:

    n_{Q,l}(t) = sgn_l(t) 2^{-R(t)} amp_l(t) - (x_l^l(t) - \gamma(t) x_l^l(t-1)),
    e_{l,j}(t) = \hat{sgn}_l(t) 2^{-R(t)} \hat{amp}_l(t) - sgn_l(t) 2^{-R(t)} amp_l(t).

Then the variable x_l^j(t) is given as

    x_l^j(t) = x_l^l(t) + \gamma(t) x_l^j(t-1) - \gamma(t) x_l^l(t-1) + n_{Q,l}(t) + e_{l,j}(t)
             = x_l^l(t) + \gamma(t) (x_l^j(t-1) - x_l^l(t-1)) + n_{Q,l}(t) + e_{l,j}(t)
             = x_l^l(t) + \sum_{i=1}^{t} \prod_{k=i+1}^{t} \gamma(k) (n_{Q,l}(i) + e_{l,j}(i)).    (4)

B. Local computation scheme: Optimization algorithms in the
presence of noise can be performed based on the gradient projection
algorithm proposed in [6]. One iteration of this algorithm is given as

    x(t) = [x(t-1) - a g(x(t-1))]^+,   x(0) = 0,    (5)

where [x]^+ is the projection of x on [0, 1], a is a constant
positive step size, and g(x(t)) is a noisy version of the gradient vector
\nabla f(x(t)). For f(x) \in \mathcal{F}_{sc,M,L}, it has been proven in [6] that if
g(x(t)) satisfies

    ||g(x(t)) - \nabla f(x(t))|| \le n^{1/2} \alpha^t,   t = 1, ...,    (6)

for \alpha < 1 sufficiently close to 1, then the sequence {x(t)} generated
by the gradient projection algorithm (5) converges linearly to the
optimal point.
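
The messaging scheme (2)-(3) can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions that are not part of the analysis above: detection is error-free (so the communication error is zero), the resolution R and the coefficient \gamma are held constant, and the local variable is a scalar. The function names and the sample trajectory are hypothetical.

```python
import math


def quantize(value, resolution):
    """Encode value as (sign, amp), amp = integer part of 2^R * |value|."""
    sign = -1 if value < 0 else 1
    amp = int(math.floor((2 ** resolution) * abs(value)))
    return sign, amp


def reconstruct(prev_copy, gamma, sign, amp, resolution):
    """Receiver update (3): gamma * previous copy + sign * 2^-R * amp."""
    return gamma * prev_copy + sign * amp / (2 ** resolution)


def track(local_values, gamma, resolution):
    """Run sender and receiver over a trajectory; both sides start from 0."""
    prev_local, copy = 0.0, 0.0
    copies = []
    for x in local_values:
        # Sender quantizes x(t) - gamma * x(t-1), as in (2).
        sign, amp = quantize(x - gamma * prev_local, resolution)
        # Receiver rebuilds its copy from the (error-free) message.
        copy = reconstruct(copy, gamma, sign, amp, resolution)
        copies.append(copy)
        prev_local = x
    return copies


trajectory = [0.3, 0.42, 0.47, 0.49, 0.5]
copies = track(trajectory, gamma=0.9, resolution=8)
for x, c in zip(trajectory, copies):
    print(f"local={x:.4f}  copy={c:.6f}  error={c - x:+.6f}")
```

With error-free detection, recursion (4) reduces to a geometric sum of quantization errors, so for constant \gamma < 1 the tracking error stays below 2^{-R}/(1 - \gamma).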

We consider a distributed implementation of the gradient
projection algorithm whereby S_j, j \in {1, ..., n}, uses the noisy copies of its
neighbors' variables to estimate g_j(x), the partial derivative of f(x)
with respect to its local variable x_j, and to update x_j as (x_j^j(0) = 0)

    x_j^j(t) = [x_j^j(t-1) - a g_j(x_1^j(t-1), ..., x_n^j(t-1))]^+.    (7)

Here, we have chosen a = 1/(ML). In the next section, we first derive a
sufficient condition on the coefficient \gamma(t), the resolution R(t), and
the probability of bit error P_b(t) to obtain an \epsilon-minimizer of f(x). We
then bound the total communication energy.
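
A centralized sketch of the gradient projection iteration (5) with step size 1/(ML) is given below in Python. It assumes a quadratic cost with diagonal Q = diag(1, 2) (so L = 1 and M = 2) and models the gradient error as an independent uniform perturbation of magnitude at most \alpha^t per coordinate, which satisfies a bound of the form (6); the actual algorithm instead obtains the gradient from quantized, noisy copies of the neighbors' variables. All names and numerical values are illustrative assumptions.

```python
import random


def project(value):
    """Projection of a scalar onto [0, 1]."""
    return min(1.0, max(0.0, value))


def noisy_gradient_projection(q_diag, b, step, alpha, iterations, seed=0):
    """Iterate x <- [x - step * g(x)]^+ with geometrically decaying gradient noise."""
    rng = random.Random(seed)
    x = [0.0] * len(b)  # x(0) = 0
    for t in range(iterations):
        # Exact gradient of 0.5 * x^T Q x + b^T x for diagonal Q.
        exact = [q * xi + bi for q, xi, bi in zip(q_diag, x, b)]
        # Each coordinate is perturbed by at most alpha^t, so the
        # error norm is at most sqrt(n) * alpha^t.
        noisy = [g + rng.uniform(-1.0, 1.0) * alpha ** t for g in exact]
        x = [project(xi - step * gi) for xi, gi in zip(x, noisy)]
    return x


Q_DIAG = [1.0, 2.0]
X_STAR = [0.5, 0.5]
B = [-q * xs for q, xs in zip(Q_DIAG, X_STAR)]  # places the minimizer at X_STAR
L_CONST, M_CONST = 1.0, 2.0

x_final = noisy_gradient_projection(
    Q_DIAG, B, step=1.0 / (M_CONST * L_CONST), alpha=0.9, iterations=200
)
print(x_final)
```

Because the perturbation decays geometrically, the iterates settle at the minimizer at a geometric rate instead of stalling at a noise floor, which is the behavior the convergence condition is designed to guarantee.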


3. CONVERGENCE CONDITION AND COMMUNICATION ENERGY

Let x(t) = [x_1^1(t), ..., x_n^n(t)]^T and define g(x(t)) as the vector of the
partial derivatives of f(x) with respect to the local variables, which
are computed locally at each node using the noisy replicas of the
neighbors' variables:

    g(x(t)) = [g_1(x_1^1(t), ..., x_n^1(t)), ..., g_n(x_1^n(t), ..., x_n^n(t))]^T.

Then, the distributed computation (7) can be expressed as the
gradient projection algorithm

    x(t) = [x(t-1) - (1/(LM)) g(x(t-1))]^+.    (8)

Notice that the convergence condition (6) is not satisfied for every
realization of channel noise. Therefore, the sequence {x(t)}
generated by (8) might not converge to the optimal point. However, under
a modified assumption, the sequence {x(t)} converges to the
optimal point x^* in the mean squared sense, as stated in the following lemma:


Lemma 1 For f(x) \in \mathcal{F}_{sc,M,L}, if g(x(t)) satisfies

    E[||g(x(t)) - \nabla f(x(t))||^2] \le n \alpha^{2t},   t = 1, ...,    (9)

where

(l-^)  +  3v/(^)<a2<l,  (10) 

then the sequence {x(t)} generated by the gradient projection
algorithm (8) converges linearly to the optimal point x^* in the mean
squared sense such that
where


The proof of Lemma 1 is similar to the proof of Proposition 5.1 in
[6] and is omitted here for lack of space. We use Lemma 1 to show
that the proposed algorithm generates an \epsilon-minimizer of f(x) if the
design parameters are chosen as

    \gamma(t) = \alpha,    (11)


m  = 


i°g2 
where \lceil x \rceil is the smallest integer greater than x, \alpha is a
positive constant less than 1, and \log_y(x) is the logarithm base y of
x.

Theorem 1 If the design parameters \gamma(t), R(t), and P_b(t)
satisfy (11)-(13), then the distributed algorithm described by (2), (3),
and (7) obtains an \epsilon-minimizer of f(x) \in \mathcal{F}_{sc,M,L} in the mean
square sense. Moreover, the communication energy to obtain an
\epsilon-minimizer of f(x), x(t_\epsilon), is

    E_com(t_\epsilon) = O((\log \epsilon^{-1})^3).


Proof: We first show that the convergence condition (9) of Lemma 1
holds, so that the proposed algorithm obtains an \epsilon-minimizer of
f(x).

For f(x) \in \mathcal{F}_{sc,M,L}, it follows from (1) that

    ||\nabla f(x) - \nabla f(y)|| \le ML ||x - y||.

Therefore,

    E[||g(x(t)) - \nabla f(x(t))||^2] \le ML E[\sum_{l=1}^{n} (x_l^J(t) - x_l^l(t))^2]
                                      = ML \sum_{l=1}^{n} E[(x_l^J(t) - x_l^l(t))^2],    (14)

where J is given as

    J = \arg\max_j ||[x_1^j(t) - x_1^1(t), ..., x_n^j(t) - x_n^n(t)]||.

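Substitute the difference between variable x_l^l and its noisy copy at
node S_j, x_l^j, from equation (4), and use the fact that the quantization
noise and communication error are uncorrelated to obtain

    E[(x_l^j(t) - x_l^l(t))^2] = E[(\sum_{i=1}^{t} \prod_{k=i+1}^{t} \gamma(k) (n_{Q,l}(i) + e_{l,j}(i)))^2]
                               = \sum_{i=1}^{t} \prod_{k=i+1}^{t} \gamma^2(k) (E[n_{Q,l}^2(i)] + E[e_{l,j}^2(i)]).    (15)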


The power of quantization noise E[n_{Q,l}^2(i)] is upper-bounded by

    E[n_{Q,l}^2(i)] \le 2^{-2R(i)},    (16)


where R(i) is the resolution of the quantizer. Notice that the number of
transmitted bits at iteration i is bounded by R(i). Therefore, the
power of communication error E[e_{l,j}^2(i)] is bounded by

    E[e_{l,j}^2(i)] \le \sum_{q=0}^{R(i)} P_b(i) 2^{-2q} < (4/3) P_b(i).    (17)
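
The two bounds can be checked numerically with the short Python sketch below. It assumes the quantizer of (2), which keeps the integer part of 2^R |v|, and draws the quantized value uniformly from [-1, 1]; that input distribution, the sample size, and the function names are assumptions made for illustration and do not come from the analysis.

```python
import math
import random


def quantization_noise(value, resolution):
    """Reconstructed value minus the true value for the sign/amplitude quantizer."""
    sign = -1.0 if value < 0 else 1.0
    amp = math.floor((2 ** resolution) * abs(value))
    return sign * amp / (2 ** resolution) - value


def mean_square_noise(resolution, samples=10000, seed=0):
    """Empirical quantization noise power for inputs uniform on [-1, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        noise = quantization_noise(rng.uniform(-1.0, 1.0), resolution)
        total += noise * noise
    return total / samples


def bit_error_weight(resolution):
    """Geometric sum over q = 0..R of 2^(-2q), which stays below 4/3."""
    return sum(4.0 ** (-q) for q in range(resolution + 1))


for r in (2, 4, 8):
    print(r, mean_square_noise(r), 2.0 ** (-2 * r), bit_error_weight(r))
```

Each individual quantization error has magnitude below 2^{-R}, so its square is below 2^{-2R} regardless of the input distribution; the empirical average only shows how loose the worst-case bound (16) is for a particular input model.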


Substitute (15)-(17) into (14) to derive

    E[||g(x(t)) - \nabla f(x(t))||^2] \le nML \sum_{i=1}^{t} \prod_{k=i+1}^{t} \gamma^2(k) (2^{-2R(i)} + (4/3) P_b(i)).


It follows from Lemma 1 that the proposed algorithm obtains an
\epsilon-minimizer of f(x) if

    \sum_{i=1}^{t} (2^{-2R(i)} + (4/3) P_b(i)) / \prod_{k=1}^{i} \gamma^2(k) \le \alpha^{2t} / (ML \prod_{k=1}^{t} \gamma^2(k)),   t = 1, ...    (18)

Notice that the left-hand side of inequality (18) is a non-decreasing
function of the iteration number t. Hence, the right-hand side of
inequality (18) should also be a non-decreasing function of the iteration
number. Therefore, we have \alpha^2 / \gamma^2(t) \ge 1, and thus \gamma(t) \le \alpha. Let
\gamma(t) = \alpha. Then, to prove the convergence of the proposed algorithm, it
is enough to show that

    \sum_{i=1}^{t} \alpha^{-2i} 2^{-2R(i)} \le 1/(2ML),    (19)

    \sum_{i=1}^{t} \alpha^{-2i} P_b(i) \le 3/(8ML).    (20)


Recall  the  resolution  R(i)  from  (12)  and  notice  that  a  <  1. 
Therefore,  the  inequality  (19)  holds  as 


E 


1  t 

1  —  a  ir-^  2  i  - 

<  -  >  a  < 

~  2 LAI  ^ 


1 

2LM ' 


Similarly, the inequality (20) holds for the choice of the bit error
probability P_b(i) in (13). Therefore, the convergence condition (9)
is satisfied and the algorithm converges linearly to the optimal point
in the mean squared sense (Lemma 1). In particular, the algorithm
obtains an \epsilon-minimizer of f(x) at iteration t_\epsilon if


te 


> 


2l0^{2 


(21) 


So far, we have shown that the proposed algorithm obtains an
\epsilon-minimizer of f(x). Next, we derive the communication energy in
terms of \epsilon. Recall that the total communication energy is

    E_com(t_\epsilon) = \sum_{i=1}^{t_\epsilon} \sum_{l=1}^{n} W_l(P_b(i)) b_l(i).


Fig. 1. Mean squared error

Notice that the communication energy per bit is

Wl(Pb(i))  =  wl  log  (4(1sA^}  2  ^  =  ^1  log a~\+0  (1) ,  (22) 

and the number of bits transmitted at iteration i, b_l(i), is bounded by
R(i) + 1. Substitute the resolution R(i) from (12) and the
communication energy per bit W_l(P_b(i)) from (22) to obtain

    E_com(t_\epsilon) \le \sum_{i=1}^{t_\epsilon} (c_1 i^2 + O(i)) = (c_1/3) t_\epsilon^3 + O(t_\epsilon^2),    (23)

where the constant c_1 is equal to 8 n \max_l w_l \log \alpha^{-1} \log_2 \alpha^{-1}. Substitute
the iteration number from (21) into (23) to derive

    E_com(t_\epsilon) = c_2 (\log \epsilon^{-1})^3 + O((\log \epsilon^{-1})^2),

where the constant c_2 is equal to c_1 / (24 (\log \alpha^{-1})^3). The theorem
has been proven. ■
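
The cubic growth in (23) comes from summing a per-iteration cost that is quadratic in the iteration index. The Python sketch below illustrates this numerically; it keeps only the c_1 i^2 term and drops the O(i) terms, the value of c_1 is an arbitrary assumption, and the function names are hypothetical.

```python
def energy_sum(c1, t_eps):
    """Leading part of the sum in (23): sum over i = 1..t_eps of c1 * i^2."""
    return sum(c1 * i * i for i in range(1, t_eps + 1))


def cubic_leading_term(c1, t_eps):
    """Closed-form leading term c1 * t_eps^3 / 3."""
    return c1 * t_eps ** 3 / 3.0


# The ratio approaches 1 as the number of iterations grows.
for t_eps in (10, 100, 1000):
    ratio = energy_sum(1.0, t_eps) / cubic_leading_term(1.0, t_eps)
    print(t_eps, ratio)
```

Since the number of iterations needed for an \epsilon-minimizer grows linearly in \log \epsilon^{-1} under linear convergence, a cost that is cubic in the iteration count translates into the (\log \epsilon^{-1})^3 energy bound.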

4.  SIMULATION  RESULTS 

To illustrate the concept, we consider a simple example where 10
nodes minimize a quadratic convex function f(x) = (1/2) x^T Q x +
b^T x + c, where Q \in R^{10 x 10}, Q \succ 0. Notice that f(x) \in \mathcal{F}_{sc,M,L},
where M and L are the condition number and minimum eigenvalue
of matrix Q, respectively. We generate a random matrix Q such that
f(x) \in \mathcal{F}_{sc,2,1}, and select vector b such that f(x) has a minimum
point at x = [0.5, ..., 0.5]^T. Since all coefficients w_l are scaled
by a common factor, in the simulation the w_l are taken to be the maximum
channel path loss, w_l = \max_j d_{l,j}^k, with k = 2. The mean square error for
different values of \alpha, averaged over 10 runs, is shown in Figure 1. This
figure confirms that the algorithm converges linearly to the optimal
point. Notice that the choice of \alpha should satisfy (10). In particular,
\alpha > \sqrt{1 - 1/M} = \sqrt{2}/2. Figure 2 shows that the communication
energy required to obtain an \epsilon-minimizer, normalized by c_2 (\log \epsilon^{-1})^3,
is bounded. This figure also shows the normalized communication
energy for the worst-case scenario where R(i) + 1 bits are
transmitted by each node at the ith iteration. These results agree with
our theoretical analysis.
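
The choice of b that places the minimizer at a prescribed point follows from the stationarity condition Q x^* + b = 0, i.e., b = -Q x^*. The Python sketch below illustrates this on a small 3 x 3 example rather than the 10-node setup; the matrix is an assumed tridiagonal example whose eigenvalues lie in [1, 2] by Gershgorin's theorem, and the helper names are hypothetical.

```python
def mat_vec(q, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(qij * xj for qij, xj in zip(row, x)) for row in q]


def linear_term_for_minimizer(q, x_star):
    """Return b such that Q x* + b = 0, making x* the stationary point of f."""
    return [-v for v in mat_vec(q, x_star)]


def gradient(q, b, x):
    """Gradient of f(x) = 0.5 * x^T Q x + b^T x + c."""
    return [qx + bi for qx, bi in zip(mat_vec(q, x), b)]


# Assumed symmetric tridiagonal example (not the matrix used in the paper).
Q = [[1.5, -0.25, 0.0],
     [-0.25, 1.5, -0.25],
     [0.0, -0.25, 1.5]]
x_star = [0.5, 0.5, 0.5]

b = linear_term_for_minimizer(Q, x_star)
print(b)
print(gradient(Q, b, x_star))
```

Because x^* lies in the interior of [0, 1]^3 and Q is positive definite, the projection is inactive at x^* and the stationary point is also the constrained minimizer.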


Fig.  2.  Normalized  communication  energy 

5.  CONCLUSIONS  AND  FUTURE  WORK 

We studied the problem of distributed optimization of a strongly
convex function in an energy-constrained network. We considered a
class of distributed gradient projection algorithms implemented
using certain digital messaging schemes. We proposed an algorithm
which requires communication energy of order O((\log \epsilon^{-1})^3) to
obtain an \epsilon-minimizer. Furthermore, this algorithm converges
linearly to the optimal solution. Our numerical simulations confirmed
our theoretical analysis.